Refactor Dataloading by bastitx · Pull Request #13 · Aleph-Alpha-Research/eval-framework

bastitx · 2025-08-26T08:39:34Z

PR Checklist

Use descriptive commit messages.
Provide tests for your changes.
Update any related documentation and include any relevant screenshots.
Check if changes need to be made to docs (README or any guides in /docs/).
Reflect the changes you made in the changelog.


Description

Move data loading out of BaseTask into a dedicated Dataloader class. This makes it possible to add further Dataloader classes that load datasets from other sources.
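To illustrate the separation described above, here is a minimal sketch. It is not the code from this PR: the `load` method signature, `InMemoryDataloader`, and `ToyTask` are hypothetical names chosen for illustration, and the real Dataloader interface may look different.

```python
from abc import ABC, abstractmethod
from typing import Any

Rows = list[dict[str, Any]]


class Dataloader(ABC):
    """Hypothetical interface: returns the rows of one split of a dataset."""

    @abstractmethod
    def load(self, path: str, split: str) -> Rows:
        ...


class InMemoryDataloader(Dataloader):
    """Stand-in for a non-HF source; serves rows from a dict keyed by (path, split)."""

    def __init__(self, data: dict[tuple[str, str], Rows]) -> None:
        self._data = data

    def load(self, path: str, split: str) -> Rows:
        # Return a copy so callers cannot mutate the stored rows.
        return list(self._data[(path, split)])


class ToyTask:
    """Simplified stand-in for BaseTask: it only names the dataset and delegates loading."""

    DATASET_PATH = "toy/dataset"  # assumed attribute name, for illustration only

    def __init__(self, dataloader: Dataloader) -> None:
        self.dataloader = dataloader

    def samples(self, split: str) -> Rows:
        return self.dataloader.load(self.DATASET_PATH, split)


loader = InMemoryDataloader({("toy/dataset", "test"): [{"question": "1+1", "answer": "2"}]})
task = ToyTask(loader)
rows = task.samples("test")
```

Because the task only depends on the abstract interface, a loader for another source can be added without touching the task code.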

Related Tickets & Documents

Related Issue #
Closes #


src/eval_framework/tasks/benchmarks/gpqa.py

src/eval_framework/tasks/benchmarks/math_reasoning.py

src/eval_framework/tasks/benchmarks/opengptx_eu20.py

FelixReinfurtAA · 2025-08-28T08:55:26Z

src/eval_framework/response_generator.py

        self.result_processor = result_processor
        self.num_samples = config.num_samples
        self.save_intermediate_results = config.save_intermediate_results
+        self.dataloader = HFDataloader()


Is the idea to keep this hard-coded for now, and to refactor only once new dataloaders are needed so that a dataloader can be injected into the ResponseGenerator?

I'm wondering about the requirements: should the same task be loadable from HF as well as from some other source using the same path (just by switching the dataloader), or should certain tasks be loaded from HF and other tasks from a different source?
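The two options in this question could be sketched roughly as follows. This is only an illustration under assumptions: `ResponseGeneratorSketch`, `default_loader`, `other_loader`, and the `per_task` mapping are hypothetical and do not reflect the actual ResponseGenerator constructor.

```python
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]
LoadFn = Callable[[str], Rows]  # hypothetical: maps a dataset path to its rows


def default_loader(path: str) -> Rows:
    # Stand-in for the hard-coded HF loader.
    return [{"source": "default", "path": path}]


def other_loader(path: str) -> Rows:
    # Stand-in for a loader reading from some other source.
    return [{"source": "other", "path": path}]


class ResponseGeneratorSketch:
    def __init__(
        self,
        dataloader: Optional[LoadFn] = None,
        per_task: Optional[dict[str, LoadFn]] = None,
    ) -> None:
        # Option 1: one injected loader for all tasks, same path, different source.
        self.dataloader = dataloader or default_loader
        # Option 2: individual tasks are pinned to their own loader.
        self.per_task = dict(per_task or {})

    def load(self, task_name: str, path: str) -> Rows:
        loader = self.per_task.get(task_name, self.dataloader)
        return loader(path)


hard_coded = ResponseGeneratorSketch()
switched = ResponseGeneratorSketch(dataloader=other_loader)
mixed = ResponseGeneratorSketch(per_task={"TaskB": other_loader})
```

Keeping the default argument means existing call sites behave as with the hard-coded loader, while either requirement can be met later without changing the tasks themselves.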

FelixReinfurtAA · 2025-08-28T08:58:08Z

src/eval_framework/task_names.py

@@ -224,7 +225,7 @@ def _check_no_duplicate_names(cls) -> None:

 def make_sure_all_hf_datasets_are_in_cache() -> None:


This was not part of this PR, but I realized this function is only used in the CI. Maybe a comment noting that would be appropriate. wdyt?

FelixReinfurtAA · 2025-08-28T08:59:49Z

I really like the abstraction of the dataloader. So far lgtm overall.
Do you plan to incorporate another dataloader as well?

bastitx force-pushed the refactor-dataloading branch 2 times, most recently from 213289d to e942df4 Compare August 26, 2025 09:34

put dataloading into own class

f652d81

bastitx force-pushed the refactor-dataloading branch from e942df4 to f652d81 Compare August 26, 2025 09:37

bastitx added 4 commits August 26, 2025 11:51

set features when loading hf dataset

59dac94

fix hf revision

50406ab

set revision on tablebench

917526e

fix comment

66ad9cf

bastitx force-pushed the refactor-dataloading branch from da5ddb0 to 66ad9cf Compare August 27, 2025 07:20

AhmedHammam-AA force-pushed the main branch from 33d4678 to db30de8 Compare August 27, 2025 11:18

replace set_features by class variable

9aae64d

bastitx force-pushed the refactor-dataloading branch from e3af211 to 9aae64d Compare August 27, 2025 20:53

bastitx and others added 2 commits August 27, 2025 20:54

Merge branch 'main' into refactor-dataloading

1a8ceca

add streaming parameter

46eb08c

bastitx marked this pull request as ready for review August 28, 2025 07:23

bastitx requested a review from a team August 28, 2025 07:23